Principal component analysis

A technique to find the axes that explain the largest variation in the data.

See also Singular value decomposition.

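The one-line definition above can be made concrete with a minimal NumPy sketch. It assumes rows are observations and columns are features. The steps are to center each column, take the SVD, and read the principal axes off the right singular vectors. The function name `pca` and the toy data are illustrative choices, not taken from any particular library.

```python
import numpy as np

def pca(X, k):
    """Return (scores, loadings, explained_variance) for the top-k components.

    Rows of X are observations, columns are features (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                             # center every feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T                                 # columns = principal axes
    scores = U[:, :k] * s[:k]                           # coordinates on those axes (equals Xc @ loadings)
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)  # variance along each axis
    return scores, loadings, explained_variance

rng = np.random.default_rng(0)
t = rng.normal(scale=3.0, size=500)                     # big spread along one direction
X = np.column_stack([t, t]) / np.sqrt(2) + rng.normal(scale=0.3, size=(500, 2))
scores, loadings, ev = pca(X, k=2)
print(loadings[:, 0])   # roughly +/-(0.707, 0.707): the stretched direction
print(ev)               # first entry near 9, second near 0.09
```

The toy data are stretched along (1, 1)/sqrt(2). The first loading should therefore point along that diagonal, up to sign. Its variance should be close to 9, which is the variance of `t`.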


How to



  1. sqrt any count features; log any heavy-tailed features. PCA prefers things that are “homoscedastic,” meaning roughly constant variance. (That is my favorite word to ASMR, and I literally do it in class.) sqrt and log are “variance stabilizing transformations”: they make the variance roughly independent of the mean.
  2. Localization is noise. Regularize when you normalize.
    1. If you make a histogram of a component (or loading) vector and a few entries are huge compared to the rest, that is localization. It’s bad. It means the vector is noise.
    2. diagnostic:
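    3. To address localization, I would suggest normalizing by regularized row/column sums. This works like fucking magic. Not even kidding. With rs and cs the row and column sums of A, set D_r = Diagonal(1 / sqrt(rs + mean(rs))) and D_c = Diagonal(1 / sqrt(cs + mean(cs))), with the division and square root taken elementwise. Then do the SVD of D_r A D_c.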
      1. paper: Zhang2018understanding
      2. youtube:
  3. And my favorite rule, the Cheshire Cat rule: “One day Alice came to a fork in the road and saw a Cheshire cat in a tree. ‘Which road do I take?’ she asked. ‘Where do you want to go?’ was his response. ‘I don’t know,’ Alice answered. ‘Then,’ said the cat, ‘it doesn’t matter.’” In other words, which transformation and normalization to use depends on what you want the PCA to tell you.

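This is a quick numerical check of why sqrt and log help. It assumes that count features are roughly Poisson and that heavy-tailed features have multiplicative (lognormal) noise. Those distributional assumptions, the function names, and the rates are illustrative choices, not part of the note.

```python
import numpy as np

def sqrt_stabilizes_counts(rates, n=100_000, seed=1):
    """For Poisson counts with each mean in `rates`, return (var of x, var of sqrt(x))."""
    rng = np.random.default_rng(seed)
    out = []
    for lam in rates:
        x = rng.poisson(lam, size=n)
        out.append((x.var(), np.sqrt(x).var()))
    return out

def log_stabilizes_multiplicative(medians, sigma=0.5, n=100_000, seed=2):
    """For lognormal data with each median in `medians`, return (var of x, var of log(x))."""
    rng = np.random.default_rng(seed)
    out = []
    for med in medians:
        x = med * rng.lognormal(mean=0.0, sigma=sigma, size=n)  # heavy right tail, strictly positive
        out.append((x.var(), np.log(x).var()))
    return out

rates = [10, 50, 250, 1000]
# Raw variance tracks the mean (Poisson: Var = mean); after sqrt it sits near 1/4.
for lam, (v_raw, v_sqrt) in zip(rates, sqrt_stabilizes_counts(rates)):
    print(f"count mean {lam:>5}: var(x) = {v_raw:9.1f}, var(sqrt x) = {v_sqrt:.3f}")

medians = [1, 10, 100]
# Raw variance grows like median**2; after log it sits near sigma**2 = 0.25.
for med, (v_raw, v_log) in zip(medians, log_stabilizes_multiplicative(medians)):
    print(f"median {med:>4}: var(x) = {v_raw:10.1f}, var(log x) = {v_log:.3f}")
```

After the transform, the variance no longer depends on the scale of the feature, which is the "homoscedastic" property PCA prefers. `np.log` fails on zeros, so the log example deliberately uses strictly positive data.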

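The localization check above is to eyeball a histogram of the vector. The minimal sketch below also reports two one-number summaries. The first is the inverse participation ratio (the sum of fourth powers of the unit vector). The second is the largest entry measured relative to the typical size 1/sqrt(n). These summaries and the toy vectors are illustrative assumptions, not something the note prescribes.

```python
import numpy as np

def localization_report(v, bins=20):
    """Histogram plus two one-number summaries of how concentrated a vector is."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)                   # compare vectors on the same (unit-norm) scale
    n = v.size
    counts, edges = np.histogram(v, bins=bins)  # the histogram to eyeball for big outliers
    ipr = float(np.sum(v ** 4))                 # order 1/n if mass is spread out, 1 if all mass is on one entry
    peak = float(np.abs(v).max() * np.sqrt(n))  # largest entry in units of the typical size 1/sqrt(n)
    return counts, edges, ipr, peak

rng = np.random.default_rng(0)
n = 1000
spread = rng.normal(size=n)                     # delocalized: every entry contributes a little
localized = 0.1 * rng.normal(size=n)
localized[:3] += 5.0                            # a few huge entries carry most of the mass

for name, v in [("spread", spread), ("localized", localized)]:
    counts, edges, ipr, peak = localization_report(v)
    print(f"{name:>9}: IPR = {ipr:.4f}, peak = {peak:.1f}, histogram counts = {counts.tolist()}")
```

For the spread-out vector, the IPR should be a small multiple of 1/n and the peak only a few units. For the localized one, both numbers jump, and the histogram shows a lone bar far from the bulk.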

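The regularized row/column normalization above can be sketched in NumPy as follows. The sketch assumes A is nonnegative (counts or a 0/1 adjacency-style matrix), so the row and column sums are nonnegative. The function names and the toy matrix are illustrative assumptions. So is the single hypothetical "hub" row, which stands in for whatever makes the raw SVD localize.

```python
import numpy as np

def regularized_svd(A, k):
    """Top-k SVD of D_r A D_c with D_r = diag(1/sqrt(rs + mean(rs))), D_c = diag(1/sqrt(cs + mean(cs)))."""
    A = np.asarray(A, dtype=float)
    rs = A.sum(axis=1)                          # row sums
    cs = A.sum(axis=0)                          # column sums
    d_r = 1.0 / np.sqrt(rs + rs.mean())         # regularized row scaling
    d_c = 1.0 / np.sqrt(cs + cs.mean())         # regularized column scaling
    L = d_r[:, None] * A * d_c[None, :]         # same as D_r @ A @ D_c, without forming the diagonals
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T

def top_left_vector(A):
    """Leading left singular vector of the unnormalized matrix, for comparison."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, 0]

rng = np.random.default_rng(0)
n, m = 1000, 500
A = (rng.random((n, m)) < 0.01).astype(float)   # sparse 0/1 noise, about 5 nonzeros per row
A[0, :] = 0.0
A[0, :300] = 1.0                                # hypothetical "hub" row with 300 nonzeros

u_raw = top_left_vector(A)
u_reg = regularized_svd(A, k=1)[0][:, 0]
print("max |entry|, raw SVD:        ", np.abs(u_raw).max())
print("max |entry|, regularized SVD:", np.abs(u_reg).max())
```

On this toy matrix, the raw leading left singular vector should put nearly all of its weight on the hub row. After the regularized scaling, its largest entry should be much smaller. Adding mean(rs) and mean(cs) also keeps every scaling finite when some rows or columns sum to zero.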